feat: support Spark expression json_array_length#4365
Conversation
This reverts commit 768b3e9.
|
|
||
| fn spark_json_array_length_array(array: &ArrayRef) -> Result<ArrayRef> { | ||
| match array.data_type() { | ||
| DataType::Utf8 => { |
There was a problem hiding this comment.
Do we need to handle other variants like LargeUtf8?
| } | ||
|
|
||
| fn get_json_array_length(json_str: &str) -> Option<i32> { | ||
| match serde_json::from_str::<serde_json::Value>(json_str) { |
There was a problem hiding this comment.
This is parsing and materializing the whole array just to get the length. Spark avoids this with a streaming parser, so our performance here may not be great, and we will likely use more memory than Spark.
Maybe you could look into streaming options with serde_json?
| object CometLengthOfJsonArray | ||
| extends CometScalarFunction[LengthOfJsonArray]("json_array_length") { | ||
|
|
||
| private val IncompatibleReason: String = |
| -- specific language governing permissions and limitations | ||
| -- under the License. | ||
|
|
||
| -- Config: spark.comet.expression.LengthOfJsonArray.allowIncompatible=false |
There was a problem hiding this comment.
shouldn't this be setting it to true?
| ('{"arrays": {"first": [1,2], "second": [3,4,5]}}'), | ||
| ('[{"arr": [1,2,3]}, {"arr": [4,5]}]') | ||
|
|
||
| query spark_answer_only |
There was a problem hiding this comment.
The expression is marked as incompatible, so these tests are not testing the Comet expression. I suggest enabling allowIncompatible=true in these tests and remove spark_answer_only to confirm they are running in Comet
There was a problem hiding this comment.
If set allowIncompatible=true we couldn't check fallback reasons, moved this test to separate file
| val exprProto = exprToProtoInternal(lengthOfJsonArray, inputs, binding) | ||
| if (exprProto.isEmpty) { | ||
| lengthOfJsonArray | ||
| .getTagValue(CometExplainInfo.EXTENSION_INFO) |
There was a problem hiding this comment.
EXTENSION_INFO was renamed to FALLBACK_REASONS in main branch recently
andygrove
left a comment
There was a problem hiding this comment.
Thanks @kazantsev-maksim. LGTM pending merging latest from apache/main to pick up the rename of EXTENSION_INFO to FALLBACK_REASON and also pending CI.
|
@andygrove thanks for the review, fixed conflict with FALLBACK_REASON |
|
Merged. Thanks @kazantsev-maksim! |

Which issue does this PR close?
Closes #.
Rationale for this change
Add native support for Spark's json_array_length(jsonArray) expression so it runs on Comet instead of falling back to Spark.
What changes are included in this PR?
How are these changes tested?
New SQL test file spark/src/test/resources/sql-tests/expressions/json/json_array_length.sql